AITopics | accuracy metric

Collaborating Authors

accuracy metric

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Time-Warping Recurrent Neural Networks for Transfer Learning

Hirschi, Jonathon

arXiv.org Machine LearningApr-6-2026

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We use time-warping as rescaling the time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems with any desired accuracy, and the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated in an applied problem on predicting fuel moisture content (FMC), an important concept in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where there are large quantities of data available for training. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. The Time-Warping method produces predictions with an accuracy level comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.

artificial intelligence, machine learning, prediction, (20 more...)

arXiv.org Machine Learning

2604.02474

Country:

North America > United States > Colorado > Denver County > Denver (0.14)
North America > United States > Oklahoma (0.06)
North America > United States > Rocky Mountains (0.04)
(15 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BIG5-TPoT: Predicting BIG Five Personality Traits, Facets, and Items Through Targeted Preselection of Texts

Le, Triet M., Chandra, Arjun, Rytting, C. Anton, Karuzis, Valerie P., Rife, Vladimir, Simpson, William A.

arXiv.org Artificial IntelligenceNov-13-2025

Predicting an individual's personalities from their generated texts is a challenging task, especially when the text volume is large. In this paper, we introduce a straightforward yet effective novel strategy called targeted preselection of texts (TPoT). This method semantically filters the texts as input to a deep learning model, specifically designed to predict a Big Five personality trait, facet, or item, referred to as the BIG5-TPoT model. By selecting texts that are semantically relevant to a particular trait, facet, or item, this strategy not only addresses the issue of input text limits in large language models but also improves the Mean Absolute Error and accuracy metrics in predictions for the Stream of Consciousness Essays dataset.

facet, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.09426

Country: Asia (0.28)

Genre:

Research Report (0.82)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Elements of Active Continuous Learning and Uncertainty Self-Awareness: a Narrow Implementation for Face and Facial Expression Recognition

Selitskiy, Stanislav

arXiv.org Artificial IntelligenceNov-11-2025

Reflection on one's thought process and making corrections to it if there exists dissatisfaction in its performance is, perhaps, one of the essential traits of intelligence. However, such high-level abstract concepts mandatory for Artificial General Intelligence can be modelled even at the low level of narrow Machine Learning algorithms. Here, we present the self-awareness mechanism emulation in the form of a supervising artificial neural network (ANN) observing patterns in activations of another underlying ANN in a search for indications of the high uncertainty of the underlying ANN and, therefore, the trustworthiness of its predictions. The underlying ANN is a convolutional neural network (CNN) ensemble employed for face recognition and facial expression tasks. The self-awareness ANN has a memory region where its past performance information is stored, and its learnable parameters are adjusted during the training to optimize the performance. The trustworthiness verdict triggers the active learning mode, giving elements of agency to the machine learning algorithm that asks for human help in high uncertainty and confusion conditions.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-19907-3_38

2511.05574

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Continuing Education (0.42)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

R1, R2, R4: Suggest more extensive analysis on Assumption 2 and the normalization step in Algorithm 1

Neural Information Processing SystemsAug-15-2025, 16:38:24 GMT

We would like to thank the reviewers for their insightful feedback. In the following, we address their key concerns. Following reviewers' suggestions, we will add more thorough analysis in the final paper. Its advantages and applications are then limited. Mixup was introduced in VPU as a regularizer to solve the overfitting problem (Table 4 and Lines 100-105, 376-384).

artificial intelligence, machine learning, mixup, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Add feedback

Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

Anzenberg, Eitan, Samajpati, Arunava, Chandrasekar, Sivasankaran, Kacholia, Varun

arXiv.org Artificial IntelligenceJul-29-2025

The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs - including models from OpenAI, Anthropic, Google, Meta, and Deepseek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model's predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups. Notably, Match Score attains a minimum race-wise impact ratio of 0.957 (near-parity), versus 0.809 or lower for the best LLMs, (0.906 vs 0.773 for the intersectionals, respectively). We discuss why pretraining biases may cause LLMs with insufficient safeguards to propagate societal biases in hiring scenarios, whereas a bespoke supervised model can more effectively mitigate these biases. Our findings highlight the importance of domain-specific modeling and bias auditing when deploying AI in high-stakes domains such as hiring, and caution against relying on off-the-shelf LLMs for such tasks without extensive fairness safeguards. Furthermore, we show with empirical evidence that there shouldn't be a dichotomy between choosing accuracy and fairness in hiring: a well-designed algorithm can achieve both accuracy in hiring and fairness in outcomes.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.02087

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.94)
Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AutomataGPT: Forecasting and Ruleset Inference for Two-Dimensional Cellular Automata

Berkovich, Jaime A., David, Noah S., Buehler, Markus J.

arXiv.org Artificial IntelligenceJun-24-2025

Cellular automata (CA) provide a minimal formalism for investigating how simple local interactions generate rich spatiotemporal behavior in domains as diverse as traffic flow, ecology, tissue morphogenesis and crystal growth. However, automatically discovering the local update rules for a given phenomenon and using them for quantitative prediction remains challenging. Here we present AutomataGPT, a decoder-only transformer pretrained on around 1 million simulated trajectories that span 100 distinct two-dimensional binary deterministic CA rules on toroidal grids. When evaluated on previously unseen rules drawn from the same CA family, AutomataGPT attains 98.5% perfect one-step forecasts and reconstructs the governing update rule with up to 96% functional (application) accuracy and 82% exact rule-matrix match. These results demonstrate that large-scale pretraining over wider regions of rule space yields substantial generalization in both the forward (state forecasting) and inverse (rule inference) problems, without hand-crafted priors. By showing that transformer models can faithfully infer and execute CA dynamics from data alone, our work lays the groundwork for abstracting real-world dynamical phenomena into data-efficient CA surrogates, opening avenues in biology, tissue engineering, physics and AI-driven scientific discovery.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.17333

Country:

Europe (0.67)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Intellectual Property & Technology Law (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Government > Regional Government (0.46)
Health & Medicine > Health Care Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.67)

Add feedback

Lexicographic optimization-based approaches to learning a representative model for multi-criteria sorting with non-monotonic criteria

Zhang, Zhen, Li, Zhuolin, Yu, Wenyu

arXiv.org Artificial IntelligenceSep-3-2024

Deriving a representative model using value function-based methods from the perspective of preference disaggregation has emerged as a prominent and growing topic in multi-criteria sorting (MCS) problems. A noteworthy observation is that many existing approaches to learning a representative model for MCS problems traditionally assume the monotonicity of criteria, which may not always align with the complexities found in real-world MCS scenarios. Consequently, this paper proposes some approaches to learning a representative model for MCS problems with non-monotonic criteria through the integration of the threshold-based value-driven sorting procedure. To do so, we first define some transformation functions to map the marginal values and category thresholds into a UTA-like functional space. Subsequently, we construct constraint sets to model non-monotonic criteria in MCS problems and develop optimization models to check and rectify the inconsistency of the decision maker's assignment example preference information. By simultaneously considering the complexity and discriminative power of the models, two distinct lexicographic optimization-based approaches are developed to derive a representative model for MCS problems with non-monotonic criteria. Eventually, we offer an illustrative example and conduct comprehensive simulation experiments to elaborate the feasibility and validity of the proposed approaches.

approach 1, criteria, marginal value function, (11 more...)

arXiv.org Artificial Intelligence

2409.01612

Country:

Asia > China > Liaoning Province > Dalian (0.04)
South America > Brazil (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Add feedback

Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks

Pimentel, Marco AF, Christophe, Clément, Raha, Tathagata, Munjal, Prateek, Kanithi, Praveen K, Khan, Shadab

arXiv.org Artificial IntelligenceJul-28-2024

As large language models (LLMs) continue to evolve, the need for robust and standardized evaluation benchmarks becomes paramount. Evaluating the performance of these models is a complex challenge that requires careful consideration of various linguistic tasks, model architectures, and benchmarking methodologies. In recent years, various frameworks have emerged as noteworthy contributions to the field, offering comprehensive evaluation tests and benchmarks for assessing the capabilities of LLMs across diverse domains. This paper provides an exploration and critical analysis of some of these evaluation methodologies, shedding light on their strengths, limitations, and impact on advancing the state-of-the-art in natural language processing.

dataset, language model, likelihood, (15 more...)

arXiv.org Artificial Intelligence

2407.21072

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Middle East > Yemen > Amran Governorate > Amran (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

kNN Classification of Malware Data Dependency Graph Features

Musgrave, John, Ralescu, Anca

arXiv.org Artificial IntelligenceJul-5-2024

Explainability in classification results are dependent upon the features used for classification. Data dependency graph features representing data movement are directly correlated with operational semantics, and subject to fine grained analysis. This study obtains accurate classification from the use of features tied to structure and semantics. By training an accurate model using labeled data, this feature representation of semantics is shown to be correlated with ground truth labels. This was performed using non-parametric learning with a novel feature representation on a large scale dataset, the Kaggle 2015 Malware dataset. The features used enable fine grained analysis, increase in resolution, and explainable inferences. This allows for the body of the term frequency distribution to be further analyzed and to provide an increase in feature resolution over term frequency features. This method obtains high accuracy from analysis of a single instruction, a method that can be repeated for additional instructions to obtain further increases in accuracy. This study evaluates the hypothesis that the semantic representation and analysis of structure are able to make accurate predications and are also correlated to ground truth labels. Additionally, similarity in the metric space can be calculated directly without prior training. Our results provide evidence that data dependency graphs accurately capture both semantic and structural information for increased explainability in classification results.

accuracy, classifier, dataset, (14 more...)

arXiv.org Artificial Intelligence

2406.02654

Country:

North America > United States > Ohio > Wayne County > Wooster (0.14)
North America > United States > Ohio > Montgomery County > Dayton (0.04)
North America > United States > Ohio > Hamilton County > Cincinnati (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Add feedback

Rethinking Entity-level Unlearning for Large Language Models

Ma, Weitao, Feng, Xiaocheng, Zhong, Weihong, Huang, Lei, Ye, Yangfan, Qin, Bing

arXiv.org Artificial IntelligenceJun-22-2024

Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many real-world scenarios, such as copyright protection. To this end, we propose a novel task of Entity-level unlearning, where the entity-related knowledge within the target model is supposed to be entirely erased. Given the challenge of practically accessing all entity-related knowledge within a model, we begin by simulating entity-level unlearning scenarios through fine-tuning models to introduce pseudo entities. Following this, we develop baseline methods inspired by trending unlearning techniques and conduct a detailed comparison of their effectiveness in this task. Extensive experiments reveal that current unlearning algorithms struggle to achieve effective entity-level unlearning. Additionally, our analyses further indicate that entity-related knowledge injected through fine-tuning is more susceptible than original entities from pre-training during unlearning, highlighting the necessity for more thorough pseudo-entity injection methods to make them closer to pre-trained knowledge.

arxiv preprint arxiv, evaluation, knowledge, (15 more...)

arXiv.org Artificial Intelligence

2406.15796

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback